The Lack of Cross-Validation Can Lead to Inflated Results and Spurious Conclusions: A Re-Analysis of the MacArthur Violence Risk Assessment Study
Abstract
Cross-validation is an important evaluation strategy in behavioral predictive modeling; without it, estimates of a predictive model's accuracy are likely to be overly optimistic. Statistical methods have been developed that allow researchers to cross-validate predictive models straightforwardly, using the same data employed to construct the model. In the present study, cross-validation techniques were used to construct several decision-tree models with data from the MacArthur Violence Risk Assessment Study (Monahan et al., 2001). The models were then compared with the original (non-cross-validated) Classification of Violence Risk assessment tool. The results show that the measures of predictive accuracy (AUC, misclassification error, sensitivity, specificity, and positive and negative predictive values) degrade considerably when a model is applied to a testing sample rather than to the training sample used to fit it. In addition, unless false negatives (that is, incorrectly predicting individuals to be nonviolent) are considered more costly than false positives (that is, incorrectly predicting individuals to be violent), the models generally make few predictions of violence. The results suggest that employing cross-validation when constructing models can make an important contribution to increasing the reliability and replicability of psychological research.
Cross-validation is an important part of constructing a behavioral predictive model. A failure to cross-validate may lead to inflated and overly optimistic results, as Meehl and Rosen (1955) noted some sixty years ago: “If a psychometric instrument is applied solely to the criterion groups from which it was developed, its reported validity and efficiency are likely to be spuriously high” (p. 194).
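The accuracy measures listed in the abstract, and the way a cutoff on a model's predicted probability encodes the relative costs of false negatives and false positives, can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes a model that outputs a predicted probability of violence, the function names are hypothetical, and the labels and scores are invented toy values rather than VRAS data. The cutoff rule p > c_FP / (c_FP + c_FN) is the standard expected-cost decision rule, used here as an assumption about how cutscores relate to costs.

```python
# Hypothetical helpers; 1 = violent, 0 = nonviolent. Toy data, not VRAS data.

def cost_threshold(cost_fn, cost_fp):
    """Probability cutoff that minimises expected misclassification cost.

    Calling a case 'violent' risks a false positive with expected cost
    (1 - p) * cost_fp; calling it 'nonviolent' risks a false negative with
    expected cost p * cost_fn. 'Violent' is the cheaper call when
    p > cost_fp / (cost_fp + cost_fn).
    """
    return cost_fp / (cost_fp + cost_fn)


def confusion_metrics(y_true, scores, cutoff):
    """Cutoff-dependent accuracy measures computed from the confusion matrix."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        predicted_violent = s > cutoff
        if predicted_violent and y == 1:
            tp += 1
        elif predicted_violent:
            fp += 1
        elif y == 0:
            tn += 1
        else:
            fn += 1

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "predicted_violent": tp + fp,
        "misclassification": (fp + fn) / (tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


def auc(y_true, scores):
    """Cutoff-free AUC: P(random violent case outscores random nonviolent case); ties count 1/2."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


y = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
scores = [0.55, 0.10, 0.30, 0.22, 0.05, 0.40, 0.15, 0.70, 0.25, 0.35]

equal = confusion_metrics(y, scores, cost_threshold(cost_fn=1, cost_fp=1))      # cutoff 0.5
fn_costly = confusion_metrics(y, scores, cost_threshold(cost_fn=4, cost_fp=1))  # cutoff 0.2
print(equal["predicted_violent"], fn_costly["predicted_violent"])  # 2 7
print(round(auc(y, scores), 3))  # 0.81
```

With equal costs the toy model flags 2 of 10 cases as violent (sensitivity 2/3, specificity 1.0); treating a false negative as four times as costly lowers the cutoff to 0.2 and flags 7, raising sensitivity to 1.0 at the expense of specificity (3/7). AUC is unaffected by the cutoff. In a cross-validated analysis these quantities would be computed on held-out predictions, not on the data used to fit the model.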
The Classification of Violence Risk (COVR; Monahan et al., 2001) assessment tool is an actuarial device designed to predict the risk of violence in psychiatric patients. The COVR is a computer-implemented program based on a classification-tree construction method and has been praised for its “ease of administration” (McDermott, Dualan, & Scott, 2011, p. 4). When the COVR was constructed, however, it was not cross-validated; thus, the results from the construction sample may be overly optimistic (see, for example, McCusker, 2007). The research presented in this paper reanalyzes the data from the MacArthur Violence Risk Assessment Study (VRAS) that were used to develop the COVR. We begin by describing a widely applied method for cross-validation, commonly called K-fold cross-validation. We then present data from the MacArthur VRAS and build several classification tree models from the VRAS dataset to demonstrate the importance of cross-validation. In addition, we show how differing cutscores (see Appendix A) implicitly affect the costs associated with false negatives and false positives. The COVR implicitly assumes that false negatives (incorrect classifications of violent individuals) are more costly than false positives (incorrect classifications of nonviolent individuals).
A Brief Introduction to Cross-Validation
Cross-validation is an important tool for prediction, allowing the researcher to estimate how accurate a prediction tool will be in practice. Assessing the accuracy of a model with the same data used to create it will give overly optimistic estimates of accuracy, because a model is typically fit by minimizing some measure of inaccuracy on those data; the fitted model therefore reflects both the true pattern in the data and the random error specific to that sample. Cross-validation is a strategy for separating these two components.
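The gap between resubstitution accuracy (scoring a model on its own training data) and K-fold cross-validated accuracy can be seen in a small, self-contained sketch. It is not the authors' procedure: as a stand-in for the paper's classification trees it uses a 1-nearest-neighbour classifier, chosen only because it fits its training data perfectly, and it simulates a binary outcome that is pure noise, so any accuracy above chance reflects error the model has memorized. All helper names are hypothetical.

```python
import random

# Stand-in model (assumption): 1-nearest-neighbour, not a classification tree.


def nearest_neighbour_predict(train_X, train_y, x):
    """Predict the label of the closest training row (squared Euclidean distance)."""
    best_i = min(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    return train_y[best_i]


def accuracy(train_X, train_y, test_X, test_y):
    """Share of test rows whose label the model fitted on the training rows predicts correctly."""
    hits = sum(
        nearest_neighbour_predict(train_X, train_y, x) == y
        for x, y in zip(test_X, test_y)
    )
    return hits / len(test_y)


def k_fold_indices(n, k, seed=0):
    """Shuffle row indices 0..n-1 and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def k_fold_accuracy(X, y, k=10, seed=0):
    """Mean held-out accuracy: each fold is scored by a model fit on the other k - 1 folds."""
    scores = []
    for fold in k_fold_indices(len(X), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        scores.append(
            accuracy(
                [X[i] for i in train], [y[i] for i in train],
                [X[i] for i in fold], [y[i] for i in fold],
            )
        )
    return sum(scores) / len(scores)


if __name__ == "__main__":
    rng = random.Random(42)
    n, p = 200, 5
    # Hypothetical data: the outcome is pure noise, unrelated to the predictors.
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [rng.randint(0, 1) for _ in range(n)]
    print("training (resubstitution) accuracy:", accuracy(X, y, X, y))
    print("10-fold cross-validated accuracy:", k_fold_accuracy(X, y))
```

On such noise data the resubstitution accuracy is exactly 1.0, because every training row is its own nearest neighbour, while the 10-fold estimate should land near the chance level of about 0.5. A real analysis would substitute the classification-tree model and the accuracy measures of interest for the stand-ins used here.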
Assume we have a dataset (X, y), where X is an n × p matrix containing n observations measured on p predictor variables, and y is an n × 1 vector containing the n observations of a single outcome variable (for example, whether an act of violence was committed). In this scenario the outcome variable is known, and constructing a model to predict the known values of y is typically referred to as supervised learning. In prediction, interest generally centers on modeling y as a function of X, where it is assumed that for some function f,
Publication date: 2016